    fpgaConvNet: A framework for mapping convolutional neural networks on FPGAs

    Convolutional Neural Networks (ConvNets) are a powerful Deep Learning model, providing state-of-the-art accuracy in many emerging classification problems. However, ConvNet classification is a computationally heavy task whose complexity scales rapidly. This paper presents fpgaConvNet, a novel domain-specific modelling framework together with an automated design methodology for mapping ConvNets onto reconfigurable FPGA-based platforms. By interpreting ConvNet classification as a streaming application, the framework employs the Synchronous Dataflow (SDF) model of computation as its basis and defines a set of transformations on the SDF graph that explore the performance-resource design space while taking into account platform-specific resource constraints. A comparison with existing ConvNet FPGA works shows that the proposed fully-automated methodology yields hardware designs that improve performance density by up to 1.62× and reach up to 90.75% of the raw performance of architectures that are hand-tuned for particular ConvNets.
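
    To make the streaming interpretation concrete, the following is a minimal sketch of how each ConvNet layer could be modelled as an SDF actor with a fixed amount of work per frame, so that pipeline throughput is bounded by the slowest actor; the layer shapes, parallelism factors and function names are illustrative assumptions, not fpgaConvNet's actual model or API.

```python
# Illustrative sketch (assumed names, not fpgaConvNet's API): each layer is an
# SDF actor; its cycles per frame depend on its workload and on how many MAC
# units the design-space exploration allocates to it ("folding").
from dataclasses import dataclass

@dataclass
class ConvActor:
    name: str
    out_pixels: int       # output feature-map pixels produced per frame
    macs_per_pixel: int   # multiply-accumulates needed per output pixel
    parallel_macs: int    # MAC units allocated to this actor

    def cycles_per_frame(self) -> float:
        # With a fully pipelined datapath, cycles ~ work / allocated parallelism.
        return self.out_pixels * self.macs_per_pixel / self.parallel_macs

def pipeline_fps(actors, clock_hz=200e6):
    # In a streaming pipeline, frame rate is limited by the slowest stage.
    bottleneck = max(a.cycles_per_frame() for a in actors)
    return clock_hz / bottleneck

# Hypothetical three-layer network with a different folding factor per layer.
layers = [
    ConvActor("conv1", out_pixels=32 * 32, macs_per_pixel=3 * 3 * 3 * 16,  parallel_macs=16),
    ConvActor("conv2", out_pixels=16 * 16, macs_per_pixel=3 * 3 * 16 * 32, parallel_macs=32),
    ConvActor("conv3", out_pixels=8 * 8,   macs_per_pixel=3 * 3 * 32 * 64, parallel_macs=32),
]
print(f"estimated throughput: {pipeline_fps(layers):.1f} frames/s")
```

    The paper's graph transformations can be thought of as moves in this space: varying each actor's parallelism (and hence its resource cost) and re-evaluating the bottleneck under the platform's resource budget.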

    Population-Based MCMC on Multi-Core CPUs, GPUs and FPGAs

    ARC 2014: a multidimensional FPGA-based parallel DBSCAN architecture

    StreamSVD: Low-rank approximation and streaming accelerator co-design

    The post-training compression of a Convolutional Neural Network (CNN) aims to produce Pareto-optimal designs on the accuracy-performance frontier when access to training data is not possible. Low-rank approximation is one of the methods often utilised in such cases. However, existing work considers the low-rank approximation of the network and the optimisation of the hardware accelerator separately, leading to systems with sub-optimal performance. This work focuses on the efficient mapping of a CNN onto an FPGA device and presents StreamSVD, a model-accelerator co-design framework. The framework simultaneously considers the compression of a CNN model through a hardware-aware low-rank approximation scheme and the optimisation of the hardware accelerator's architecture by taking into account the approximation scheme's compute structure. Our results show that the co-designed StreamSVD outperforms existing work that utilises similar low-rank approximation schemes by providing a better accuracy-throughput trade-off. The proposed framework also achieves competitive performance compared with other post-training compression methods, even outperforming them in certain cases.
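
    As a rough illustration of the kind of low-rank approximation being referred to, the sketch below factorises a flattened convolution weight matrix with a truncated SVD; this is a generic post-training scheme with assumed shapes and rank, not StreamSVD's hardware-aware variant.

```python
# Generic truncated-SVD low-rank approximation of a conv layer (illustrative
# only; StreamSVD's hardware-aware scheme and rank selection are not shown).
import numpy as np

def low_rank_factor(weights, rank):
    """Factor a (C_out, C_in*k*k) weight matrix into two thinner factors."""
    u, s, vt = np.linalg.svd(weights, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # (C_out, rank)
    b = vt[:rank, :]             # (rank, C_in*k*k)
    return a, b

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128 * 3 * 3))   # hypothetical conv layer, flattened
a, b = low_rank_factor(w, rank=16)

# The original layer is replaced by two cheaper ones: a rank-r convolution
# followed by a 1x1 convolution, shrinking the MAC count accordingly.
print(f"relative error: {np.linalg.norm(w - a @ b) / np.linalg.norm(w):.3f}")
print(f"MAC reduction:  {w.size / (a.size + b.size):.1f}x")
```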

    Now that I can see, I can improve: Enabling data-driven finetuning of CNNs on the edge

    In today's world, a vast amount of data is being generated by edge devices that can be used as valuable training data to improve the performance of machine learning algorithms in terms of the achieved accuracy or to reduce the compute requirements of the model. However, due to user data privacy concerns as well as storage and communication bandwidth limitations, this data cannot be moved from the device to the data centre for further improvement of the model and subsequent deployment. As such, there is a need for increased edge intelligence, where the deployed models can be fine-tuned on the edge, leading to improved accuracy and/or a reduced workload as well as a smaller memory and power footprint. In the case of Convolutional Neural Networks (CNNs), both the weights of the network and its topology can be tuned to adapt to the data that it processes. This paper provides a first step towards enabling CNN fine-tuning on an edge device based on structured pruning. It explores the performance gains and costs of doing so and presents an extensible open-source framework that allows the deployment of such approaches on a wide range of network architectures and devices. The results show that, on average, data-aware pruning with retraining provides 10.2pp higher accuracy over a wide range of subsets, networks and pruning levels, with a maximum improvement of 42.0pp, compared with pruning and retraining performed in a manner agnostic to the data being processed by the network.
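
    The following is a minimal sketch of data-aware structured pruning in the spirit described above: output channels that stay weak on the data actually observed at the edge are dropped, after which the model would be retrained on-device; all names, shapes and the pruning level are assumptions for illustration, not the framework's API.

```python
# Illustrative data-aware channel pruning (assumed names and 25% pruning level;
# not the paper's open-source framework).
import numpy as np

def channel_importance(activations):
    """Mean absolute activation per output channel over a batch (N, C, H, W)."""
    return np.abs(activations).mean(axis=(0, 2, 3))

def prune_channels(weights, activations, prune_ratio=0.25):
    """Drop the least important output channels of a (C_out, C_in, k, k) layer."""
    scores = channel_importance(activations)
    keep = np.sort(np.argsort(scores)[int(len(scores) * prune_ratio):])
    return weights[keep], keep

rng = np.random.default_rng(0)
w = rng.standard_normal((32, 16, 3, 3))       # hypothetical conv weights
acts = rng.standard_normal((8, 32, 28, 28))   # activations on locally observed data
w_pruned, kept = prune_channels(w, acts)
print(f"kept {len(kept)}/{w.shape[0]} channels; pruned weight shape: {w_pruned.shape}")
# In the paper's setting, this step is followed by retraining (fine-tuning) on
# the observed data to recover or improve accuracy.
```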

    Area-driven partial reconfiguration for SEU mitigation on SRAM-based FPGAs

    This paper presents an area-driven Field-Programmable Gate Array (FPGA) scrubbing technique based on partial reconfiguration for Single Event Upset (SEU) mitigation. The proposed method is compared with existing techniques such as blind and on-demand scrubbing on a novel SEU mitigation framework implemented on the ZYNQ platform, supporting various SEU and scrubbing rates. A design-space exploration of availability versus data transfers from a Double Data Rate Type 3 (DDR3) memory shows that our approach outperforms blind scrubbing for a range of availability values when a second-order polynomial IP is targeted. A comparison to an existing on-demand scrubbing technique based on Dual Modular Redundancy (DMR) shows that our approach saves up to 46% area for the same case study.
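
    To make the availability-versus-data-transfer trade-off concrete, here is a back-of-the-envelope model for blind scrubbing with an assumed first-order availability formula and made-up numbers; it is not the paper's analysis or its polynomial-IP case study.

```python
# Rough illustrative model (assumptions, not the paper's analysis): with blind
# scrubbing every T seconds, a scrub lasting scrub_time seconds, and an SEU rate
# lam (upsets/s), unavailability ~ scrub downtime + average exposure to an
# uncorrected upset, while DDR3 traffic grows as the bitstream is re-read.
def blind_scrub_tradeoff(T, scrub_time, lam, bitstream_bytes):
    unavailability = scrub_time / T + lam * T / 2   # valid while lam * T << 1
    ddr3_bytes_per_s = bitstream_bytes / T
    return 1.0 - unavailability, ddr3_bytes_per_s

# Hypothetical numbers: 2 MB partial bitstream, 5 ms scrub, 1e-3 upsets/s.
for T in (0.1, 0.5, 2.0, 10.0):
    avail, traffic = blind_scrub_tradeoff(T, 5e-3, 1e-3, 2 * 1024**2)
    print(f"T={T:5.1f}s  availability={avail:.4f}  DDR3 traffic={traffic / 1e6:.2f} MB/s")
```

    Shorter scrub periods cost more DDR3 bandwidth but shrink the window in which an upset can corrupt the design, which is the kind of trade-off the paper's design-space exploration navigates with its area-driven, partial-reconfiguration approach.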

    Semi-dense SLAM on an FPGA SoC

    Deploying advanced Simultaneous Localisation and Mapping (SLAM) algorithms in autonomous low-power robotics will enable emerging new applications which require an accurate and information-rich reconstruction of the environment. This has not been achieved so far because accuracy and dense 3D reconstruction come with a high computational complexity. This paper discusses custom hardware design on a novel platform for embedded SLAM, an FPGA-SoC, combining an embedded CPU and programmable logic on the same chip. The use of programmable logic, tightly integrated with an efficient multicore embedded CPU, stands to provide an effective solution to this problem. In this work, an average frame rate of more than 4 frames/second at a resolution of 320×240 has been achieved, with an estimated power of less than 1 Watt for the custom hardware. In comparison to the software-only version running on a dual-core ARM processor, an acceleration of 2× has been achieved for LSD-SLAM, without any compromise in the quality of the result.

    Towards efficient on-board deployment of DNNs on intelligent autonomous systems

    With their unprecedented performance in major AI tasks, deep neural networks (DNNs) have emerged as a primary building block in modern autonomous systems. Intelligent systems such as drones, mobile robots and driverless cars largely base their perception, planning and application-specific tasks on DNN models. Nevertheless, due to the nature of these applications, such systems require on-board local processing in order to retain their autonomy and meet latency and throughput constraints. In this respect, the large computational and memory demands of DNN workloads pose a significant barrier to their deployment on the resource- and power-constrained compute platforms that are available on-board. This paper presents an overview of recent methods and hardware architectures that address the system-level challenges of modern DNN-enabled autonomous systems at both the algorithmic and hardware design levels. Spanning from latency-driven approximate computing techniques to high-throughput mixed-precision cascaded classifiers, the presented set of works paves the way for the on-board deployment of sophisticated DNN models on robots and autonomous systems.
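
    As a small illustration of one of the directions mentioned above, the sketch below shows the generic idea behind a mixed-precision cascaded classifier: a cheap low-precision stage handles confidently classified inputs and only uncertain ones fall through to the full model; the models, threshold and names are stand-ins, not a specific architecture from the surveyed works.

```python
# Generic two-stage cascade sketch (illustrative stand-ins, not a specific
# system from the paper): a cheap quantised model classifies most inputs and
# defers low-confidence ones to a larger full-precision model.
import numpy as np

def cascade_predict(x, cheap_model, full_model, confidence=0.9):
    probs = cheap_model(x)                        # low-precision, high-throughput stage
    if probs.max() >= confidence:
        return int(probs.argmax()), "cheap"
    return int(full_model(x).argmax()), "full"    # accurate fallback stage

# Stand-in models: any callables returning class probabilities would work here.
cheap = lambda x: np.array([0.95, 0.03, 0.02]) if x.mean() > 0 else np.array([0.40, 0.35, 0.25])
full = lambda x: np.array([0.10, 0.80, 0.10])

rng = np.random.default_rng(0)
for sample in (rng.standard_normal(8) + 1.0, rng.standard_normal(8) - 1.0):
    label, stage = cascade_predict(sample, cheap, full)
    print(f"predicted class {label} via {stage} stage")
```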